CPSC 545/445 (Autumn 2003) - Class 22: DNA-based Computing (2) Module 7, Part 3 --- [brief recap of DNA-based computation, adleman's experiment] Note: - main advantage over conventional computation: massively parallel, molecular scale - entire procedure required 7 person-days of lab work (most time-consuming step: Step 4) - number of different oligos is linear in size of graph - quantity needed in step 1 = exponential in size of graph (but this can be somewhat improved using a better algorithm for HPP) -- Surface-based approach (Smith et al., 1998): key idea: bind DNA to surface (as in microarrays, glass or gold), perform computation steps by exposing surface-bound DNA to various mixtures of DNA in solution, restriction enzymes, etc. advantages over solution-based approach: - minimal loss of DNA between steps (in solution-based approach: loss trough residual liquid in test-tube) - limited potential for (undesired) interaction between surface bound DNAs - purification simply consists of "washing" surface (in solution-based approach: electrophoresis - more work!) - can use advanced microarray technology for creating / working with surface-bound DNA A simple model for surfaced-based DNA computation (Liu et al., 1996) operations: - mark(i,b): marks all strings with bit i = b in a given set S - multi-mark((i1,b1), ..., (ik,bk)): same, but marks simultaneously based on values of bits i1, ..., ik - destroy-marked: removes all marked strings from set S - destroy-unmarked: removes all unmarked strings from set S - unmark: unmarks all strings in set S - test-if-empty: determines whether set S is empty These operations are sufficient for solving the propositional SAT problem (and hence any NP-complete problem). SAT: given: propositional formula F in conjunctive normal form, i.e., conjunction of clauses, where each clause is a disjunction over variables and their negations. goal: decide whether there is an assignment of truth values (0,1 = false,true) to variables in F under which F is true. DNA-SAT algorithm: strings represent truth assignments of variables in given formula for each clause C do for each unnegated variable x_i in C do mark(i,1) for each negated variable x_i in C do mark(i,0) (note: at this point, all assignments that satisfy C are marked) destroy-unmarked unmark test-if-empty Implementation: representation of candidate solutions (assignments): DNA oligomers, one base per bit (using A,G for 0 and T,C for 1, 50% GC content can be achieved) alternative: short DNA sequence (length > 1) per "atomic assignment" need additional flanking 5' and 3' primer sequences for PCR amplification mark(i,b): introduce complementary sequences to all elements in S for which bit i has value b -> marked strands are double stranded destroy-marked: use Exonuclease I to degrade single-stranded oligos unmark: wash surface with distilled water (destabilises double stranded DNA) test-if-empty: PCR amplify strands this method has been used to solve a 5-variable, single solution SAT problem problems: - lower surface hybridisation efficiency (compared to solution-based approach) - slower surface enzyme kinetics (compared to solution-based approach) - real-estate on 2-dimensional surface limits scale of computation -> increased dimensions or density, or utilise 3d geometries - specific hybridisation becomes more problematic for longer oligos (larger problem instances), incorrect hybridisation needs to be controlled more carefully. -> DNA code design -- Fundamental issues: DNA code design - construct set S of DNA words (=oligos) such that: 1) desired interactions between word of their complements are uniformly strong 2) undesired interactions between words, words and complements of other words, or complements of words are avoided -> unary and binary constraints on words in S Examples for such constraints: To achieve (1): - require uniform GC content for all words in S (unary constraint) - require that words in S and their complements have small range of free energy melting temperatures (unary constraint) - require that words in S don't form secondary structure (unary constraint) To achieve (2): - require minimum Hamming distance between any two words in S (binary constraint) - require gap between minimum free energy (or melting temperature) for duplexes between any two words in S and maximum free energy (or melting temperature) for words and their complements Other constraints arising in the context of using sequences of DNA words (equivalent to bitstrings or vectors in standard computation): - no undesired interactions across junctions (example) - arbitrary strings over words from S don't form secondary structure Finding sets of DNA words satisfying such constraints is a hard combinatorial problem -> state-of-the-art algorithms by Tulpan, Andronescu, Hoos, and Condon Note: Certain DNA word design problems are closely related to problems of designing error-correcting codes in coding theory. -- 7.4 Self-assembly "computations" [brief introduction only] Idea: construct DNA (or RNA) molecules that assemble themselves into 2d or 3d structures Purpose: Construction of nano-devices (mechanical or eletrical), perform computation Implementation: Use multiple crossover structures and sticky ends Examples: - Seeman's Double Crossover DNA Arrays -> http://seemanlab4.chem.nyu.edu/DX.arrays.html - Seeman's cube and truncated octahedron -> http://seemanlab4.chem.nyu.edu/nano-cube.html -> http://seemanlab4.chem.nyu.edu/nano-oct.html - Seeman's B-Z device (a nanomechanical switch with two conformational states, based on B- and Z-conformation of DNA double helix, triggered by addition and removal to a chemical - Hexaamminecobalt(III) chloride) -> http://seemanlab4.chem.nyu.edu/BZ.Device.html DNA based self-assembly computation -> http://seemanlab4.chem.nyu.edu/XOR.html Idea: computation performed by a Turing machine can be encoded in mosaics of Wang tiles (adjacent tiles must match, each row of the mosaic corresponds to a configuration of the TM) Implementation: Realise Wang tiles using branched DNA molecules with sticky ends (the latter correspond to the four sides of a Wang tile and ensure matches between adjacent tiles) -> applications in algorithmic assembly Note: Assembly is massively parallel, but problems can arise if (incorrect) late stages of computation assemble that don't match up with earlier parts (unreachable configurations) Note: Many issues similar to the ones discussed for solution-based and surface based DNA computation arise. -> work by Ned Seeman / Eric Winfree --- resources: DNA-based computation: L.Adleman: Molecular Computation of Solutions to Combinatorial Problems. Science 266(5187):1021-1024 DNA Computer. http://people.cornell.edu/pages/zf25/dna_computer.html Lagally Research Group. http://mrgcvd.engr.wisc.edu/lagallygroup/research/DNA/DNA_Computation.htm Ned Seeman's Lab. http://seemanlab4.chem.nyu.edu/homepage.html